
G3: Genes, Genomes, Genetics

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match G3: Genes, Genomes, Genetics's content profile, based on 222 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
The phenotypic nonspecificity of cell-to-cell signalling in Drosophila melanogaster.

Percival-Smith, A.; Brabrook, C.

2026-05-21 genetics 10.64898/2026.05.19.726339 medRxiv
Top 0.1%
8.6%

An expectation of a hypothesis that proposes cell-to-cell signalling pathways are redundant due to the redundancy of pathway terminal transcription factors (TFs) was tested by screening 35 signalling ligands (SLs) for rescue of a decapentaplegic (dpp) hypomorphic wing growth phenotype. The screen identified three examples of partial rescue: Hedgehog (HH), Semaphorin 1a (SEMA1A) and Wnt ortholog 2 (WNT2). HH overexpression with dppGAL4 may increase the expression of DPP activity from the hypomorphic dpp alleles. However, SEMA1A and WNT2 did not phenocopy ectopic expression of HH or DPP, and neither SEMA1A nor WNT2 was required for wing growth, suggesting substitution of DPP for partial restoration of wing growth. The WNT2 rescue was dependent on the Frizzled 4 (FZ4) WNT receptor, excluding the possibility that WNT2 weakly binds the DPP receptor. Although examples of phenotypic nonspecificity of SL function were identified, this is an expectation, and not direct proof, of the hypothesis of TF redundancy. Screen Report Summary: An expectation of a hypothesis proposing that cell-to-cell signalling pathways are redundant due to the redundancy of the pathway terminal transcription factors was tested by screening for replacement of one signalling ligand (DPP; SLa) with another SLb for wing growth. Three non-DPP SLs were identified in the screen of 35 SLs: HH, SEMA1A and WNT2. Genetic analysis of Sema1a and Wnt2 suggests functional complementation of dpp for wing growth, suggesting that SEMA1A and WNT2 partially replace DPP for wing growth. Therefore, an expectation of the hypothesis is met.

2
Near chromosome-level genome assembly for the invasive annual forb Centaurea melitensis

Dant, A.; Pelosi, J.; Northing, P. C.; Dlugosch, K. M.

2026-05-20 genomics 10.64898/2026.05.18.726060 medRxiv
Top 0.1%
8.5%

Premise: Centaurea melitensis (Asteraceae) is a problematic invader of grasslands globally, but little is known about its genetic makeup. Here we develop a reference genome to facilitate studies of its invasion history, genetic variation, and evolution. Methods: Inbred offspring of a single individual of C. melitensis from its invasion of California, USA were used for flow cytometry to estimate genome size, and for genomic DNA extraction. DNA was sequenced with PacBio HiFi technology (yield = 85.7 Gb). The genome was assembled with Hifiasm and annotated with BRAKER3. GENESPACE was used to compare gene order (synteny) with three other species within the subfamily Cichorioideae. Results: We estimated a mean genome size of 795.0 Mbp for C. melitensis, and our assembly totaled 696.6 Mbp in 48 contigs (N50 = 55.6 Mbp; BUSCO = 98%), with annotation of 25,157 protein-encoding genes. This included four telomere-to-telomere putative chromosomes, nine additional chromosome arms terminated by telomeric repeats, and a complete chloroplast genome. Synteny varied markedly across the genus and subfamily, suggesting a dynamic history of structural variation in the lineage of C. melitensis. Discussion: We provide a highly complete and contiguous genome assembly to facilitate the further study of genomic variation in C. melitensis.
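
For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal Python sketch of the standard definition: the length of the contig at which the cumulative sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the contig lengths are illustrative assumptions and are not taken from the preprint.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mbp (not the C. melitensis assembly).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
print(example_n50)
```

With these made-up lengths the total is 290, so the cumulative sum passes the halfway mark (145) at the second-longest contig.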

3
The impact of long-read sequencing on fungal genome assemblies: progress and disparity

Kroll, E.; Zoclanclounon, Y. A. B.; Urban, M.; Hill, R.; Hammond-Kosack, K. E.

2026-05-14 genomics 10.64898/2026.05.12.724544 medRxiv
Top 0.1%
6.7%

Fungal genomics has expanded rapidly over the past 30 years, and recently the pace and breadth have further increased for many taxa, although many taxonomic gaps persist. With three decades of rapid growth, fungal genomics now merits a re-examination of its history, progress, and unresolved taxonomic gaps. Here, we review the development of fungal genomics from early efforts such as the Fungal Genome Initiative to current progress driven by third-generation long-read sequencing. We have compiled and summarised publicly available fungal genomes to highlight trends in assembly quality, adoption of long-read technologies, and taxonomic representation. Notably, substantial phylogenetic gaps remain, particularly outside Dikarya, and significant challenges persist for unculturable taxa. This review identifies priorities for the fungal community, including: (1) coordinated efforts to close major taxonomic gaps across the fungal tree of life; (2) improved repository metrics to facilitate identification of high-quality assemblies; and (3) improved and standardised genome annotation, which is lacking for most assemblies. Together, these steps will support the development of reliable genomic resources that capture the full breadth of diversity across the fungal kingdom, generating foundational data for comparative genomics, evolutionary biology, functional studies, genetic studies and applied research.

4
Mapping of Stripe Rust and Leaf Rust Resistance Genes in the Hard Red Winter Wheat Population Green Hammer/Lonerider

Sharma, R.; Wang, M.; Chen, X.; Carver, B. F.; Guttieri, M.; St. Amand, P.; Bernardo, A.; Bai, G.; Liu, S.; Ara, A. M.; Aoun, M.

2026-05-15 genetics 10.64898/2026.05.13.724876 medRxiv
Top 0.1%
6.3%

Stripe rust and leaf rust, caused by Puccinia striiformis f. sp. tritici and P. triticina, respectively, are the most destructive wheat diseases in the southern Great Plains. Green Hammer is a hard red winter wheat (HRWW) cultivar released by Oklahoma State University in 2018 and has demonstrated a stable adult plant resistance to stripe rust and race-specific seedling resistance to leaf rust. To identify and map rust resistance loci, 109 doubled haploid (DH) lines derived from the cross between Green Hammer and another HRWW cultivar, Lonerider, were developed. Lonerider showed adult plant resistance to stripe rust but was susceptible to multiple P. triticina races. The DH lines were evaluated for stripe rust at the adult plant stage in greenhouse and field environments across Oklahoma, Kansas, and Washington, and for leaf rust at the seedling stage against seven U.S. P. triticina races and at the adult plant stage in Oklahoma and Texas. Genotyping-by-sequencing generated 6,078 polymorphic single-nucleotide polymorphisms used for genetic mapping. Quantitative trait loci (QTL) analysis identified 14 stripe rust and 8 leaf rust resistance QTL. For stripe rust, a major QTL in Green Hammer, QYr.osughln-2AS, was identified in the proximity of the 2NvS translocation. Three other major stripe rust resistance QTL were identified in Lonerider on chromosomes 2AL (two QTL) and 2BS (one QTL). For leaf rust, QLr.osughln-1DS and QLr.osughln-2DS.1 were the two major QTL identified in Green Hammer and most likely correspond to the all-stage resistance genes Lr21 and Lr39, respectively. In this study, we identified previously characterized genes as well as unknown genes that can be utilized in wheat breeding programs to enhance resistance to leaf rust and stripe rust.

5
Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv
Top 0.1%
4.9%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and CV1 for yield and test weight (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10 to 0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.
Core ideas: (1) Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments. (2) Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments. (3) Prediction of new lines in new environments remains challenging. Plain language summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

6
Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv
Top 0.1%
4.8%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

7
Transcriptomic profiling of the embryonic C. elegans intestine with single-cell resolution

Hill, J. L.; Ellis, J. P.; Williams, R. T.; Apodaca, A.; Basu, A.; Moore, A.; Osborne Nishimura, E.

2026-05-22 genetics 10.64898/2026.05.20.726538 medRxiv
Top 0.1%
4.8%

At a mere 20 cells, the Caenorhabditis elegans intestine regulates metabolism, energy homeostasis, host defense, yolk production, and genetic aging, all while dynamically responding to its environment. How the intestine develops to carry out these disparate functions is unknown, and how cells differ along the length of the intestine is unclear. To address these questions, we performed single-cell RNA sequencing (scRNA-seq) on FACS-enriched intestinal cells from mixed-stage C. elegans embryos. The resulting single-cell transcriptomes of 974 cells organized into 13 clusters, suggesting a diversity of cell types and states. We used two post hoc approaches to ascribe identities to each cluster. First, genes with known developmental timing in early, mid, and late stages were used to place clusters in time, and smiFISH microscopy was used to fine-tune the assignments. Second, the eight late-stage clusters were assessed for their region of origin. To assign these clusters to anatomical regions, we identified marker genes for each cluster and assessed their expression along the anterior-to-posterior length of the intestine using smiFISH microscopy. Genes associated with growth and cell division were expressed in early stages, whereas genes associated with immune responses and metabolism were expressed later. Genes associated with biotic responses and RNA metabolism were the most likely to vary across the intestine's anterior-posterior axis. Finally, perturbation of anterior-localized intestinal transcripts more robustly affected intestinal function compared to central or posterior-localized genes. Overall, this research illustrates the intrinsic heterogeneity across the 20 cells of the embryonic intestine and sets the stage for future work aimed at understanding cell-specific intestinal responses to diet and the environment.
Article summary: We investigate how the Caenorhabditis elegans intestine develops specialized functions on a spatiotemporal scale. We used single-cell RNA-sequencing to analyze embryonic intestinal cells and identify 13 distinct clusters. Combining gene expression analysis with microscopy, we assigned clusters to developmental stages and anatomical regions. Clusters associated with early intestine development express genes linked to growth and cell division, while later-stage clusters express genes involved in metabolism and immune responses. Genes varied across the intestine's anterior-to-posterior axis, and disrupting anterior-specific genes produced stronger functional effects. These findings reveal previously unrecognized intestinal diversity and provide insight into how intestinal cells specialize during development.

8
An exact formula for the contribution of sampling error to r2, a common measure of linkage disequilibrium

Waples, R. S.

2026-05-21 evolutionary biology 10.64898/2026.05.19.726388 medRxiv
Top 0.1%
3.6%

Interest in quantifying linkage disequilibrium (LD, non-random associations of alleles at different loci) has skyrocketed in recent years as researchers have focused on use of LD in genome-wide association studies (GWAS), for studying historical demography, and for estimating effective population size (Ne). The most widely used LD metric is r2 = the squared correlation of alleles at a pair of loci. Despite a half century of efforts, developing an unbiased expectation of r2 as a function of the many factors that can affect it (physical linkage, genetic drift, selection, migration, mutation, mating systems) remains elusive. Furthermore, even when all of these other factors are absent, empirical estimates of r2 are upwardly biased by sampling a finite number (S) of individuals, and that must be accounted for if one wants to focus on the desired signal of LD. Previous approaches to estimate [Formula] have been shown to be biased to greater or lesser degrees. The purpose of this short paper is to demonstrate that a simple and apparently exact expression for [Formula] does exist for the special case where sampling error is the only factor contributing to r2, in which case [Formula] = 1/(S - 1). When other factors contribute heavily to LD, [Formula] shrinks toward 0 as empirical r2 → 1. However, for estimating contemporary Ne with unlinked markers, empirical r2 will generally be small and 1/(S - 1) will provide a robust estimate of [Formula].
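
The 1/(S - 1) value quoted in the abstract above can be illustrated with a small Python sketch. It is not the author's derivation. It relies on a standard statistical fact: if the values at one locus are shuffled at random among S individuals, the squared Pearson correlation with a second locus averages exactly 1/(S - 1) over all possible shuffles. The genotype vectors (coded as 0, 1 or 2 copies of an allele) and the function names are hypothetical.

```python
from itertools import permutations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def mean_r2_over_permutations(x, y):
    """Average r2 over every reordering of y, so that any association
    between x and y arises from sampling alone."""
    values = [r_squared(x, shuffled) for shuffled in permutations(y)]
    return sum(values) / len(values)


# Hypothetical genotypes for S = 6 individuals at two unlinked loci.
locus_a = [0, 1, 2, 1, 0, 2]
locus_b = [1, 0, 0, 2, 1, 1]
S = len(locus_a)

mean_r2 = mean_r2_over_permutations(locus_a, locus_b)
print(mean_r2, 1 / (S - 1))
```

The two printed numbers agree up to floating-point rounding. Enumerating all orderings is only practical for very small S, and the sketch addresses only the special case in which sampling is the sole source of association.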

9
A gap-free, telomere-to-telomere genome assembly for the Caenorhabditis briggsae reference strain AF16

O'Connor, L. M.; Moya, N. D.; Jhaveri, N. S.; Tanny, R. E.; Khorshidian, A.; Lyu, H.; Chamberlin, H. M.; Baird, S. E.; Andersen, E. C.

2026-05-04 genomics 10.64898/2026.04.30.721887 medRxiv
Top 0.2%
3.5%

The nematode Caenorhabditis elegans was the first metazoan to have its genome completely sequenced and assembled. Since that time, researchers have continuously updated the reference genome and manually curated its approximately 20,000 genes. The closely related species, Caenorhabditis briggsae, has served as a comparative model because of its similar morphology, mode of reproduction, and patterns of intra-species genetic variation. However, the genomic resources for C. briggsae lag behind C. elegans, hindering comparative genomics studies between the species. Decades of experimentation have been performed in the AF16 reference strain genetic background, so we obtained high-coverage long-read sequencing and high-throughput chromosome conformation capture data to create an updated reference genome for an isogenic derivative of AF16, named CGC2. The CGC2 genome is vastly improved relative to the existing AF16 assemblies, with no unplaced sequence, no gaps, and telomere-to-telomere contiguity. To provide genomic resources for CGC2, we exploited deep RNA-seq libraries from all developmental stages to predict protein-coding gene annotations comparable in accuracy and completeness to the existing AF16 gene models. We also performed lift-over of 108 validated insertion-deletion variants to the updated coordinate system of the CGC2 genome to facilitate future mappings of mutations. In summary, we present an updated reference genome for the canonical AF16 reference strain with improved genomic resources to enable high-quality intra- and inter-species comparative studies.

10
Characterization of genetically effective cells and EMS mutagenesis on the novel winter oil seed Pennycress (Thlaspi arvense)

Brusa, A.; Branch, C.; Sulivan, L.; Chopra, R.; Rai, K.; Rockstad, G.; Gjesvold, E. S.; Ott, M.; Jain, S.; Biel, C. C.; Marks, M. D.

2026-05-05 genomics 10.64898/2026.04.30.722012 medRxiv
Top 0.2%
3.0%

Pennycress (Thlaspi arvense L.) is an intermediate winter oilseed crop that has only recently been domesticated for agronomic use. Improving agronomic traits requires sources of genetic variation, and mutagenesis is frequently used to help overcome the limitations of natural populations. We investigate the impact of ethyl methanesulfonate (EMS) on genetically effective cells (GECs) to characterize the intra-individual genetic variation of EMS mutagenesis in pennycress. We identified that pennycress contains at least 4 GECs which, when treated with EMS, create unique mutations across different branches within the same individual plant. We then propagated the M2 plants for whole genome sequencing, providing extensive characterization of the EMS mutation profile and developing a gene index as a resource for future reverse genetic screenings. Article summary: Pennycress is an emerging winter oilseed crop in the American Midwest. Domestication efforts have advanced rapidly through a combination of genetic techniques. One of the most successful methods has been the use of a mutant gene index, a large collection of pennycress seed where new genetic variation has been created through ethyl methanesulfonate (EMS). EMS mutations are not uniform, however, and a single treated seed can have wide genetic variation within the resulting plant. We investigate the role of genetically effective cells in EMS variation, and present the full EMS population as a resource for further pennycress domestication efforts.

11
Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

Marone, M. P.; Chen, E.; Himmelbach, A.; Haberer, G.; Spannagl, M.; Stein, N.; Mascher, M.

2026-05-18 genomics 10.64898/2026.05.13.724976 medRxiv
Top 0.3%
2.0%

Background: As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion. Results: We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content. Conclusions: By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.

12
Functional genomic map of local adaptation in sorghum to guide allele mining

Xu, Y.; Das, A.; Cruet-Burgos, C.; Morris, G. P.; Lasky, J. R.

2026-05-18 evolutionary biology 10.64898/2026.05.17.725773 medRxiv
Top 0.4%
1.7%

Genomic data from genebanks could be exploited to find alleles adapted to target environments for resilience breeding, but it can be difficult to prioritize among the thousands of accessions and millions of genomic variants. There are competing hypotheses for the molecular basis and architecture of local adaptations: e.g. whether cis-regulatory versus amino acid changing variants are more important; or whether small-effect, low pleiotropy versus large-effect, high pleiotropy variants are more important. Here, we compare a range of variant types and genomic contexts thought to influence effect size, pleiotropy, and selection for their role in local adaptation in 443 whole genome resequenced African sorghum landraces. We used genotype-environment associations (GEAs) as evidence of local adaptation. We found that GEA were particularly enriched in the vicinity of genes and depleted elsewhere. However, enrichment was strongest in likely cis-regulatory contexts: accessible chromatin, unmethylated regions, and in transposable elements close to genes. Near genes, there were clear peaks in GEAs at the transcription start site, where mutations are demonstrated to have the largest expression effects. Additionally, GEAs in accessible chromatin and unmethylated regions were better predictors of genetic variation in response to experimental drought than comparable loci. Having tested hypotheses about the variants underlying local adaptation, we can now use this knowledge of the importance of cis-regulatory variation in the search for new environmentally-adaptive alleles for plant improvement.

13
Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv
Top 0.4%
1.7%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.

14
Environmental impacts on gene expression noise and its relationship with fitness

Haque, T.; Siddiq, M. A.; Duveau, F. M.; Wittkopp, P.

2026-05-18 evolutionary biology 10.64898/2026.05.18.725919 medRxiv
Top 0.4%
1.7%

Genetically identical cells grown in the same environment show variation in gene expression known as expression noise. Expression noise can be heritable and impact fitness, making it subject to natural selection. Increasing expression noise for the Saccharomyces cerevisiae TDH3 gene was shown to be beneficial in glucose-based media when mean TDH3 expression was far from the fitness optimum but deleterious when it was close to this optimum. Here, we show that growth on different carbon sources alters the effects of new mutations on TDH3 expression noise and examine the fitness effects of changing expression noise. In galactose-based media, we observed the same relationship between expression noise and fitness seen in glucose-based media, but in glycerol- and ethanol-based media, we observed the opposite relationship or no significant relationship, respectively. Using simulations of single-cell organisms, we found that these differences were most likely explained by environment-specific relationships between gene expression and fitness. We also found that, far from the optimum, the fitness effects of noise were greatest when expression was highly heritable between mother and daughter cells. The empirical observations and simulations reported in this study show how environments influence both the production of expression noise and its impacts on fitness.

15
A songbird karyotype: cytogenetic confirmation of a migration-associated region rich in olfactory receptor genes.

Caballero Lopez, V.; Dedukh, D.; Ekman, D.; Kauzal, O.; Lundberg, M.; Odenthal-Hesse, L.; Proux-Wera, E.; Reifova, R.; Reif, J.; Altmanova, M.; Trifonov, V.; Bensch, S.

2026-05-07 genomics 10.64898/2026.05.04.721007 medRxiv
Top 0.4%
1.7%

The genetics of bird migration is advancing rapidly, driven by refinements of sequencing and tracking technologies. In willow warblers (Phylloscopus trochilus), a complex repeat-rich region named MARB (Migration Associated Repeat Block) has recently been found to correlate with the routes taken by individual birds from Europe to their African wintering grounds. However, the genomic location of this region remains unknown. Here, we characterized MARB using a combination of approaches to understand how it evolved. We describe the region using long-read genome assemblies of two willow warbler subspecies (P. t. trochilus and P. t. acredula), two related species, the common chiffchaff (P. collybita) and the greenish warbler (P. trochiloides), and whole genome sequencing data from 76 willow warblers. Finally, we applied karyotyping and fluorescent in situ hybridization techniques on willow warbler spermatocytes to cytogenetically locate MARB. Due to the many repeats, we cannot order scaffolds in silico, but probe hybridization on the karyotype shows that MARB constitutes a single locus (~27.5 Mb) spanning most of the 11th largest chromosome in the willow warbler genome. Interestingly, the MARB regions of all species share several characteristics, such as relatively high GC content (50%), a high density of specific repeat families and, notably, more than 800 olfactory receptor sequences. Regions homologous to MARB may exist in several migrant bird genomes, though they are currently unassembled due to their complexity. Resolving these in species with similar migratory polymorphisms to willow warblers will be essential to determine whether MARB influences migratory behaviour across species.

16
The stability of fatty acid composition in sunflower oil is dependent on environment and affected by structural variation

Ingold, M.; Gao, Q.; Mandel, J. R.; McNellie, J. P.; Keepers, K. G.; Barb, J. G.; Burke, J. M.; Rieseberg, L. H.; Hulke, B. S.

2026-05-07 plant biology 10.64898/2026.05.04.722759 medRxiv
Top 0.4%
1.7%

In sunflower (Helianthus annuus L.), the composition of fatty acids in the seeds, primarily oleic, linoleic, stearic and palmitic acid, is of utmost importance for oil quality. Despite this, the genetic basis of this trait and its interaction with the environment are poorly understood. Understanding this interaction is critical to improvement of sunflower within the context of climate change. In this work, we incorporated fatty acid composition measurements from the sunflower SAM population and eight environments across an extensive geographic cline into GWAS. The SAM panel consists of 287 varieties representing approximately 90% of sunflower diversity, for which 2.2 million high-quality SNPs with a MAF > 5% are available. For increased power, multivariate GWAS was performed with four different inputs: (i) mean fatty acid composition within each environment, (ii) mean fatty acid composition within each environment omitting high oleic varieties, (iii) trait stability within environments quantified by standard errors among replicate samples ( stability) and (iv) Eberhart and Russell's β, which quantifies trait stabilities across environments (β stability). All four analyses yielded highly significantly associated SNPs. We found that high oleic varieties exhibited high β trait stability, resulting in substantial overlap in markers between analyses (i) and (iv), with signals being fairly consistent between environments in analysis (i). For analyses (ii) and (iii), significant markers tended to vary between trials. For significant SNPs across all analyses, 147 candidate genes were identified, including promising candidates such as 15 fatty acid metabolism genes, 6 heat shock proteins and 22 transcription factors. Lastly, a large introgression consisting of two flanking inverted sequences on Chromosome 5 was found to coincide with stability in the Georgia trial, suggesting a role in FA composition stability under high heat conditions.
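
As background to the β stability measure named in the abstract above, here is a minimal Python sketch of the regression coefficient as it is commonly defined in Eberhart and Russell's stability analysis: each variety's trait values are regressed on an environmental index, which is the mean of all varieties in an environment minus the grand mean. The variety names, trait values and function name are hypothetical, and the preprint's own computation may differ in detail.

```python
def eberhart_russell_beta(trait_by_variety):
    """Return {variety: beta}, the slope of each variety's trait values
    regressed on the environmental index.

    trait_by_variety maps a variety name to a list of trait values,
    one per environment, in the same environment order for all varieties.
    """
    varieties = list(trait_by_variety)
    n_env = len(trait_by_variety[varieties[0]])

    # Mean of all varieties within each environment.
    env_means = [
        sum(trait_by_variety[v][j] for v in varieties) / len(varieties)
        for j in range(n_env)
    ]
    grand_mean = sum(env_means) / n_env

    # Environmental index: deviation of each environment from the grand mean.
    index = [m - grand_mean for m in env_means]
    index_ss = sum(i * i for i in index)

    return {
        v: sum(y * i for y, i in zip(trait_by_variety[v], index)) / index_ss
        for v in varieties
    }


# Hypothetical oleic acid percentages for three varieties in three environments.
oleic = {
    "variety_A": [10.0, 12.0, 14.0],
    "variety_B": [10.0, 11.0, 12.0],
    "variety_C": [10.0, 13.0, 16.0],
}
betas = eberhart_russell_beta(oleic)
print(betas)
```

In this toy data set variety_B changes least across environments and receives the smallest slope. By construction the slopes average to 1 across varieties, so a value near 1 indicates an average response to the environment.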

17
The contribution of non-additive genetic effects to the genetic variance of polyploid species.

Clo, J.

2026-05-14 genetics 10.64898/2026.05.12.724556 medRxiv
Top 0.4%
1.7%

Whole genome duplication is a common mutation in eukaryotes with far-reaching phenotypic effects. The resulting morphological, physiological, and fitness consequences and how they affect the survival probability of newly polyploid lineages are intensively studied, but very little is known about the effect of genome doubling on the short-term evolvability of populations. Understanding the effect of polyploidization on the adaptive potential of populations is of crucial importance to predict the future of polyploid populations. In this paper, I investigate the immediate consequences of genome doubling on the genetic variance of populations. To do so, I performed numerical iterations and simulations of how the genetic variance of a quantitative trait changes after polyploidization, under different genetic architectures (additivity, dominance, and epistasis). I found that genetic variance generally decreases after genome doubling. Non-additive gene actions can make autotetraploid populations genetically more diverse than their diploid progenitors in rare cases, notably with overdominance and directional epistasis. By collecting estimates from the agronomic literature, I found that both dominance and epistatic variance contribute to the genetic variance of polyploid populations. These results bring new insights into the adaptive potential of newly formed tetraploid populations, and call for further experimental investigations of how polyploidization is associated with a short-term decrease in evolvability.

18
Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology 10.64898/2026.05.03.722538 medRxiv
Top 0.4%
1.7%

Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable and high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, risking models that are impractical to use on commercial breeding-scale data sets because of run times and computational demands. This study addresses two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years and 9 locations representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of using interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.

19
Reduction of Pollen Number and Anther Length in Bread Wheat Studied by a Nested Association Mapping Population

Hamaya, N.-B.; Kakui, H.; Okada, M.; Jilu, N.; Jung, K.; Nitta, M.; Wicker, T.; Keller, B.; Nasuda, S.; Shimizu, K. K.

2026-05-23 plant biology 10.64898/2026.05.22.727104 medRxiv
Top 0.4%
1.7%

The number of pollen grains, which carry male gametes in seed plants, has attracted interest in genetics, evolution, and breeding. Rapid evolutionary reductions in pollen number and anther length have been reported in selfing species as well as domesticated species, although this poses a challenge for hybrid breeding. Here, we studied the variation of pollen number and anther length in hexaploid bread wheat (Triticum aestivum) by employing a quick pollen counting method. Pollen numbers in cultivars were lower than those in landraces among 54 lines of diverse geographic origins. Using the year of registration of traditional and modern cultivars, we found a reduction in pollen number over the past 150 years. We detected high heritability and variation among Asian landraces and cultivars. Thus, we conducted QTL mapping of pollen number as well as of anther length using nested association mapping lines in which Norin 61 was the common parent. Genomic loci encompassing Green Revolution genes (Rht-B1, Rht-D1, and Ppd-D1) showed significant effects on pollen number and anther length, but their contributions were relatively minor. Although anther length has often been used as a proxy for pollen number in bread wheat, our data showed that the correlation between the two is not necessarily high. Interestingly, we identified a new QTL for pollen number that was not detected by measuring anther length and, vice versa, a new QTL specific to anther length. These data suggest that pollen number has decreased rapidly in bread wheat but can be modified using the genetic diversity of landraces.

Significance statement: We found that modern cultivars of bread wheat have reduced pollen number and shorter anther length, which are common in domesticated species but can be a challenge for hybrid breeding. Using underutilized Asian landraces and cultivars, we report that new quantitative trait loci, as well as loci used in the Green Revolution, are responsible for the traits and can be employed to increase pollen numbers.

20
Historically Small Population Size Limits Purging of Deleterious Mutations in a Conservation-Reliant Species, the Kirtland's Warbler

Calderon, A. M.; Salis, A. T.; Toews, D. P. L.; Szpiech, Z. A.

2026-05-16 evolutionary biology 10.64898/2026.05.15.725193 medRxiv
Top 0.5%
1.5%

Strong population contractions can leave a persistent genomic legacy that influences populations long after their demographic recovery. While bottlenecks facilitate the removal of strongly deleterious mutations, the effectiveness of purging may be limited in historically small populations. The Kirtland's warbler (Setophaga kirtlandii) is a rare North American songbird with an ancestrally small population. After narrowly evading extinction, it is one of the few species that have been delisted from federal protections in the USA. Despite this recovery, a previous study showed evidence for recent inbreeding and a high burden of deleterious mutations that may not have been purged despite strong bottlenecks. Historical DNA offers a unique opportunity to understand the consequences of recent demographic declines on genetic diversity. Here, we use DNA from over 100-year-old museum specimens to estimate changes in genetic load in Kirtland's warblers pre- and post-bottleneck. We validate our results with forward-in-time genetic simulations and explore how sample size and missing data can affect estimates. Both empirical data and simulations suggest a reduced ability to purge deleterious mutations in this historically small population. Our simulations also highlight that limited sampling design and data quality can constrain the ability to detect changes.